by

Nozer D. Singpurwalla and Simon P. Wilson
The George Washington University, Washington, D.C. 20052


Abstract 


A formal approach for evaluating the reliability of computer software is through probabilistic models
and statistical methods. This paper is an expository overview of the literature on the former. The
various probability models for software failure can be classified into two groups; the merits of these
groups are discussed and an example of their use in decision problems is given in some detail. The
direction of current and future research is also discussed.


1. Introduction.


Having been developed over the last 20 years, software reliability is a relatively new area of research for
the statistics and computer science communities. It arose from the need to predict the reliability of
software, particularly when its failure could be catastrophic. The software that controls an aircraft
carrier, a nuclear power station, a submarine or a life-support machine needs to be very reliable, and
statistical techniques can help the computer scientist decide whether such software is sufficiently
reliable. The subject is also of commercial importance, for example when decisions have to be made about
releasing software into the marketplace.

All software is subject to failure because of the inevitable presence of errors (or bugs) in the code, so
the first aim of the subject has been to develop models that describe software failure. There are various
ways of specifying such failure models, and Section 2 discusses them in some detail. It is fair to say
that deriving such models has been the focus of research so far. Once a failure model has been specified,
it can be applied to problems such as determining the optimal time to debug software or deciding whether
software is ready for release. These applications have received less attention in the literature but are
becoming more prevalent. We mention here that there is another approach to software reliability that
differs considerably from the statistical ideas presented here. It attempts to prove the reliability, or
correctness, of software by formal proof, just as one would prove a mathematical theorem. This is an
exercise in logic, albeit a rather complex one. It works well on small programs, for example on a program
that computes the factorial function, but becomes a forbidding task for even moderately complex pieces of
code. Nevertheless, the idea that software can be proved correct is appealing. The approach is not
discussed further.

This paper is divided into five further sections. Section 2 categorizes the different strategies that have
been used to model software failure. Section 3 reviews the historical development of the subject by
describing some of the more commonly used models, and Section 4 shows that several of these models can be
unified if one adopts a Bayesian position. Section 5 looks at an application of the material developed in
Section 3, and Section 6 concludes with a look at the current and future direction of the subject. We
assume that the reader has some familiarity with basic reliability and probability concepts; in
particular, he or she should know some common probability distributions, statistical inference and
decision making, Poisson processes and the concept of a failure rate.
2. Model Categorization


All statistical software reliability models are probabilistic in nature: they attempt to specify the
probability of software failure in some manner. Looking through the literature, one observes that the
models developed so far can be broadly classified into two categories:

Type I: those which propose a probability model for the times between successive failures of the
software, and

Type II: those which propose a probability model for the number of failures up to a certain time.

Time is often taken to be CPU time, that is the amount of time that the software is actually running, as
opposed to real (calendar) time. In theory, a specification via one of these two methods determines the
other, so a model that specifies the times between failures can also give the number of failures in a
given time, and vice versa. In practice, deriving one from the other may not be straightforward.


The first of these categories, modeling the times between failures, is most commonly accomplished by
specifying the failure rate of the software as it is running. When this is the case the model is said to
be of Type I-1. The failure rate $r_{T_i}(t)$ of the i-th time between failures is given for i = 1, 2, 3, ...,
and a probability model results, since a failure rate determines its distribution through
$P(T_i > t) = \exp(-\int_0^t r_{T_i}(u)\, du)$. One distinctive feature of software is that its failure rate
may decrease with time, as more bugs are discovered and corrected. This contrasts with most mechanical
systems, which age over time and so have an increasing failure rate. An attempt to debug software may also
introduce new bugs, which tends to increase the failure rate, so the decreasing failure rate assumption is
somewhat idealized. Nevertheless, most of the models of this type reviewed here have a decreasing failure
rate.
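To make the step from a failure rate to a probability model concrete, the following Python sketch computes $P(T > t) = \exp(-\int_0^t r(u)\, du)$ for an arbitrary failure rate $r$ by numerical integration. The function name `survival_from_failure_rate` and the use of the trapezoidal rule are illustrative choices, not part of any model reviewed here.

```python
import math

def survival_from_failure_rate(rate, t, steps=1000):
    """Return P(T > t) = exp(-integral of rate(u) from 0 to t), using the trapezoidal rule."""
    if t <= 0:
        return 1.0
    h = t / steps
    area = 0.5 * (rate(0.0) + rate(t))
    area += sum(rate(k * h) for k in range(1, steps))
    return math.exp(-h * area)

# Constant rate 2: an exponential lifetime, so P(T > 1) should be close to exp(-2).
print(survival_from_failure_rate(lambda u: 2.0, 1.0))
# Decreasing rate 1/(1 + u): P(T > t) = 1/(1 + t), so this should be close to 0.25.
print(survival_from_failure_rate(lambda u: 1.0 / (1.0 + u), 3.0))
```

The second call illustrates the decreasing failure rate typical of software: the longer the program runs without failing, the less likely it is to fail in the next instant.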

Another way to model the times between failures is to define a stochastic relationship between successive
failure times. Models specified by this method are known as Type I-2, and have the advantage over Type I-1
that they model the times between failures directly, not via the abstract concept of a failure rate. For
example, let $T_1, T_2, T_3, \ldots$ be the lengths of the times between successive failures of the
software. As a simple case, one could declare that $T_{i+1} = \rho T_i + \epsilon_i$, where $\rho > 0$ is a
constant and $\epsilon_i$ is an error term (typically some random variable with mean 0). Then $\rho < 1$
would indicate decreasing times between failures (software reliability expected to become worse),
$\rho = 1$ would indicate no expected change in software reliability, whilst $\rho > 1$ indicates
increasing times between failures (software reliability expected to improve). Those familiar with time
series will recognize the relationship in this example as an autoregressive process of order 1; in
general, one would write $T_{i+1} = f(T_1, T_2, \ldots, T_i) + \epsilon_i$ for some function $f$.
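A minimal Python sketch of the autoregressive example above. Since $E[\epsilon_i] = 0$, the expected times satisfy $E[T_{i+1}] = \rho E[T_i]$, which `expected_gaps` computes. `simulate_gaps` draws one path assuming Gaussian errors with standard deviation `sigma`; because such errors can make a time negative, each value is floored at a tiny positive number. The Gaussian errors and the floor are illustrative assumptions, not part of the model as stated.

```python
import random

def expected_gaps(t1, rho, n):
    """E[T_1], ..., E[T_n] when T_{i+1} = rho * T_i + eps_i with E[eps_i] = 0."""
    gaps = [float(t1)]
    for _ in range(n - 1):
        gaps.append(rho * gaps[-1])  # the zero-mean error drops out of the expectation
    return gaps

def simulate_gaps(t1, rho, sigma, n, seed=0):
    """One sample path with Gaussian errors, floored so that every time stays positive."""
    rng = random.Random(seed)
    gaps = [float(t1)]
    for _ in range(n - 1):
        gaps.append(max(rho * gaps[-1] + rng.gauss(0.0, sigma), 1e-9))
    return gaps

print(expected_gaps(10.0, 1.2, 5))        # rho > 1: reliability expected to improve
print(simulate_gaps(10.0, 0.8, 1.0, 5))   # rho < 1: times tend to shrink
```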

The second of these categories, modeling the number of failures, uses a point process to count the
failures. Let $M(t)$ be the number of failures of the software that are observed during time $[0, t)$.
$M(t)$ is modeled by a Poisson process, which is a stochastic process with the following properties:

i) $M(0) = 0$, and if $s < t$ then $M(s) \le M(t)$. $M(t)$ takes values in $\{0, 1, 2, \ldots\}$.

ii) The numbers of failures that occur in disjoint time intervals are independent. So, for example, the
number of failures in the first 5 hours of use has no effect on the number of failures in the next 5 hours.

iii) The number of failures up to time $t$ is a Poisson random variable with mean $\mu(t)$, for some
non-decreasing function $\mu(t)$; that is to say:

$$P(M(t) = n) = \frac{(\mu(t))^n e^{-\mu(t)}}{n!}, \qquad n = 0, 1, 2, \ldots$$


The different models of this type have different functions $\mu(t)$, which is called the mean value
function. The mean number of failures up to time $t$ is indeed $\mu(t)$, as is the variance. The Poisson
process is chosen because in many ways it is the simplest point process, yet it is flexible and has many
useful properties that can be exploited. This second approach has become increasingly popular in recent
years. $M(t)$ can also be specified by its intensity function $\lambda(t)$, the derivative of $\mu(t)$ with
respect to $t$; either of these functions completely specifies a particular Poisson process. One
disadvantage of this approach is that, because a Poisson random variable can take any non-negative integer
value, it implies that there is conceptually no upper limit on the number of bugs in the program, which is
obviously impossible for code of finite length. Another disadvantage is more subtle: by property ii) the
model implies that the numbers of failures in adjoining time intervals are uncorrelated, whereas with a
finite total number of bugs they should be negatively correlated, since many failures in one interval
leave fewer bugs to cause failures in the next.
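The claim that the mean and the variance of $M(t)$ both equal $\mu(t)$ can be checked numerically. This Python sketch evaluates the Poisson probabilities on the log scale (to avoid overflow in $n!$) and sums them up to a truncation point `n_max`; the mean value function used in the example, $\mu(t) = 5(1 - e^{-t})$, is hypothetical.

```python
import math

def poisson_pmf(n, mu):
    """P(M(t) = n) when the mean value function equals mu at time t."""
    if mu == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def mean_and_variance(mu, n_max=100):
    """Mean and variance of M(t), summing the pmf over n = 0..n_max (a truncation)."""
    probs = [poisson_pmf(n, mu) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, var

# Hypothetical mean value function mu(t) = 5 * (1 - exp(-t)), evaluated at t = 2.
mu_t = 5.0 * (1.0 - math.exp(-2.0))
print(mu_t, mean_and_variance(mu_t))  # both moments should be close to mu_t
```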


Figure 1 is a flow-chart showing the above categorization of the statistical models.

Figure 1. Categorization of Software Reliability Models.
3. Review of Some Software Reliability Models


This section introduces some of the better-known probability models for software reliability, with
examples from each of the two main categories discussed in the previous section. Since the main purpose of
the review is to describe the ideas and assumptions behind the models, technical details are kept to a
minimum in most cases. Readers interested in the details of a particular model should consult the paper
in which it was originally presented.

Some common notation is assumed throughout this section:

$T_i$: the i-th time between failures of the software [i.e. the time between the (i-1)th and i-th failures],
$r_{T_i}(t)$: the failure rate of $T_i$, the i-th time between failures, at time $t$,
$M(t)$: the number of failures of the software in the time interval $[0, t)$ (a Poisson process),
$\lambda(t)$: the intensity function of $M(t)$,
$\mu(t)$: the expected number of failures of the software in time $[0, t)$, so that

$$\mu(t) = \int_0^t \lambda(s)\, ds,$$

since $M(t)$ is a Poisson process.

Ten models are presented. Models 1 to 7 are of Type I-1, models 8 and 9 are of Type II and model 10 is of
Type I-2. A problem common to all the models is the lack of data on which to test their validity; data on
software reliability are commercially sensitive, so statisticians in academia have very little information
on how software in the marketplace actually performs. For this reason it is important that the
assumptions made in deriving these models are carefully thought through.

1. The model of Jelinski & Moranda (1972).

This was the very first software reliability model to be proposed, and it has formed the basis for many
models developed since. It is a Type I-1 model: it models times between failures by considering their
failure rates. Jelinski and Moranda reasoned as follows. Suppose that the total number of bugs in the
program is $N$, and that each time the software fails, one bug is corrected. The failure rate of the i-th
time between failures, $T_i$, is then assumed to be a constant proportional to $N - i + 1$, which is the
number of bugs remaining in the program. In other words

$$r_{T_i}(t \mid N, \Lambda) = \Lambda (N - i + 1), \qquad i = 1, 2, 3, \ldots, \quad t > 0,$$

for some constant $\Lambda$.

There are some criticisms that one could make of the model. It assumes that each error contributes the
same amount $\Lambda$ to the failure rate, whereas in reality different bugs will have different effects.
It also assumes that every time a fix is made, no new bugs are introduced; note [see Figure 2(i)] that the
successive failure rates are indeed decreasing. A model like this is sometimes referred to as a
"de-eutrophication model", because the process of removing bugs from software is akin to the removal of
pollutants from rivers and lakes.
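A small Python sketch of the Jelinski & Moranda model as stated above: it lists the failure rates $\Lambda(N - i + 1)$, the corresponding expected times between failures $1/(\Lambda(N - i + 1))$ (each $T_i$ is exponential), and one simulated realization. The function names and the values $N = 10$, $\Lambda = 0.05$ are hypothetical.

```python
import random

def jm_failure_rates(N, lam):
    """Failure rates Lambda*(N - i + 1) of T_1, ..., T_N under Jelinski-Moranda."""
    return [lam * (N - i + 1) for i in range(1, N + 1)]

def jm_expected_gaps(N, lam):
    """Each T_i is exponential with rate Lambda*(N - i + 1), so E[T_i] is its reciprocal."""
    return [1.0 / r for r in jm_failure_rates(N, lam)]

def jm_simulate(N, lam, seed=0):
    """Draw one realization of T_1, ..., T_N (random.expovariate takes the rate)."""
    rng = random.Random(seed)
    return [rng.expovariate(r) for r in jm_failure_rates(N, lam)]

# Hypothetical values: N = 10 bugs, each contributing Lambda = 0.05 to the rate.
print(jm_failure_rates(10, 0.05))   # drops by exactly Lambda after every fix
print(jm_expected_gaps(10, 0.05))   # expected gaps grow as bugs are removed
print(jm_simulate(10, 0.05))
```

The constant drop of $\Lambda$ in the first printed list is precisely the equal-contribution assumption criticized above.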

2. Bayesian Reliability Growth Model (Littlewood & Verrall (1973)).

Like the Jelinski & Moranda model, the model proposed by Littlewood and Verrall looks at the times between
failures of the software. However, they did not develop the model by characterizing the failure rate;
rather, they stated that the model should not be based on fault content (as Jelinski & Moranda had
assumed) and then declared that $T_i$ has an exponential distribution with parameter $\lambda_i$, and that
$\lambda_i$ itself has a gamma distribution with shape $\alpha$ and rate (inverse scale) $\psi(i)$, for
some function $\psi$. Despite this, it is still considered to be a Type I-1 model.


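Specifically:

$$f_{T_i}(t \mid \lambda_i) = \lambda_i e^{-\lambda_i t}, \qquad t > 0,$$

$$\pi_{\lambda_i}(\lambda \mid \alpha, \psi(i)) = \frac{(\psi(i))^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\psi(i)\lambda}, \qquad \lambda > 0.$$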


$\psi(i)$ was intended to describe the quality of the programmer and the programming task. As an example,
they chose $\psi(i) = \beta_0 + \beta_1 i$. One can show that this makes the failure rate of each $T_i$
decreasing in $t$, and that each time a bug is discovered and fixed there is a downward jump in the
successive failure rates; see Figure 2(ii). In fact, integrating out $\lambda_i$ gives
$P(T_i > t) = (\psi(i)/(\psi(i) + t))^{\alpha}$, so that

$$r_{T_i}(t \mid \alpha, \beta_0, \beta_1) = \frac{\alpha}{\beta_0 + \beta_1 i + t}, \qquad t \ge 0.$$

If $\beta_1 > 1$ then the jumps in the failure rate decrease in $i$, if $\beta_1 < 1$ they increase, whilst
if $\beta_1 = 1$ they remain constant. So if $\beta_1$ differs from 1, the fixing of each bug makes a
different contribution to the reduction in the failure rate of the software, an apparent advantage over
the model of Jelinski & Moranda. This model has received quite a lot of attention and has been the subject
of various modifications; see models 6 and 7 later in this section.
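The marginal failure rate above comes from averaging the exponential over the gamma distribution of $\lambda_i$. The following Python sketch evaluates the closed forms and checks the survival function by Monte Carlo, drawing $\lambda_i$ from the gamma and then $T_i$ from the exponential. Note that Python's `random.gammavariate(alpha, beta)` takes a *scale* `beta`, so the rate $\psi(i)$ is passed as `1/psi`; the parameter values are hypothetical.

```python
import random

def lv_failure_rate(t, i, alpha, beta0, beta1):
    """Marginal failure rate alpha / (beta0 + beta1*i + t) of T_i, with lambda_i integrated out."""
    return alpha / (beta0 + beta1 * i + t)

def lv_survival(t, i, alpha, beta0, beta1):
    """Marginal P(T_i > t) = (psi / (psi + t))**alpha, where psi = beta0 + beta1*i."""
    psi = beta0 + beta1 * i
    return (psi / (psi + t)) ** alpha

def lv_survival_mc(t, i, alpha, beta0, beta1, reps=20000, seed=1):
    """Monte Carlo estimate: lambda_i ~ gamma(shape alpha, rate psi), then T_i ~ exponential(lambda_i)."""
    rng = random.Random(seed)
    psi = beta0 + beta1 * i
    hits = 0
    for _ in range(reps):
        lam = rng.gammavariate(alpha, 1.0 / psi)  # gammavariate(shape, scale); scale = 1/rate
        hits += rng.expovariate(lam) > t
    return hits / reps

# Hypothetical values alpha = 2, beta0 = 1, beta1 = 1: failure rates at t = 0 for i = 1..5.
print([round(lv_failure_rate(0.0, i, 2.0, 1.0, 1.0), 3) for i in range(1, 6)])
print(lv_survival(1.5, 2, 2.0, 1.0, 1.0), lv_survival_mc(1.5, 2, 2.0, 1.0, 1.0))
```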


3. The De-eutrophication model of Moranda (1975).

Another (de-eutrophication) model, due to Moranda (1975), attempted to answer some of the criticisms of
the Jelinski & Moranda model, in particular the criticism concerning the equal effect that each bug in the
code has on the failure rate. He hypothesized that fixing the bugs that cause early failures reduces the
failure rate more than fixing those that cause later failures, because the early bugs are more likely to
be the bigger ones. With this in mind, he proposed that the failure rate should remain constant within
each $T_i$, but decrease geometrically in $i$ after each failure, i.e. for constants $D$ and $k$,

$$r_{T_i}(t \mid D, k) = D k^{i-1}, \qquad t > 0, \quad D > 0 \text{ and } 0 < k < 1.$$

Compared with the Jelinski & Moranda model, where the drop in failure rate after each failure is always
$\Lambda$, the drop in failure rate here after the i-th failure is $D k^{i-1}(1 - k)$; see Figure 2(iii).
The assumption of a perfect fix, with no new bugs introduced during the fix, is retained.
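A short Python sketch contrasting the geometric rates of Moranda's model with the constant drops of Jelinski & Moranda: it computes the rates $D k^{i-1}$ and the drops $D k^{i-1}(1 - k)$, which shrink with each fix. The values $D = 1$ and $k = 0.5$ are hypothetical.

```python
def moranda_rates(D, k, n):
    """Failure rates D * k**(i-1) for T_1, ..., T_n (geometric de-eutrophication)."""
    return [D * k ** (i - 1) for i in range(1, n + 1)]

def moranda_drops(D, k, n):
    """Drop in failure rate after the i-th failure, D * k**(i-1) * (1 - k), for i = 1..n."""
    return [D * k ** (i - 1) * (1 - k) for i in range(1, n + 1)]

# Hypothetical values: D = 1.0, k = 0.5. Early fixes remove more of the failure rate.
print(moranda_rates(1.0, 0.5, 4))
print(moranda_drops(1.0, 0.5, 3))
```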

4. Imperfect Debugging Model (Goel & Okumoto (1978)).

This model is another generalization of the Jelinski & Moranda model, which attempts to address the
criticism that a bug is not always fixed perfectly. Goel & Okumoto's Imperfect Debugging Model is like the
Jelinski & Moranda model, but assumes that there is a probability $p$, $0 < p \le 1$, of fixing a bug when
it is encountered. This means that after $i$ faults have been found, we expect $i \times p$ faults to have
been corrected, instead of $i$. Since $i - 1$ faults have been found before $T_i$ begins, the failure rate
of $T_i$ is

$$r_{T_i}(t \mid N, \Lambda, p) = \Lambda (N - p(i - 1)).$$

When $p = 1$ we get the Jelinski & Moranda model.
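In Python, the imperfect debugging rates can be sketched as follows; setting `p = 1` recovers the Jelinski & Moranda rates, and for general `p` each failure lowers the rate by $\Lambda p$ rather than $\Lambda$. The function name is hypothetical, and `n` should be small enough that $N - p(n - 1)$ stays positive.

```python
def imperfect_debugging_rates(N, lam, p, n):
    """Failure rates Lambda*(N - p*(i-1)) of T_1, ..., T_n.

    p*(i-1) is the expected number of bugs corrected before T_i begins; choose n so
    that N - p*(n-1) > 0, otherwise the last rates would not be valid failure rates.
    """
    return [lam * (N - p * (i - 1)) for i in range(1, n + 1)]

# Hypothetical values: N = 10, Lambda = 0.2, and only half of the fixes succeed.
print(imperfect_debugging_rates(10, 0.2, 0.5, 5))
```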
5. A model by Schick & Wolverton (1978).


This is yet another Type I-1 model, and this time the failure rate is assumed to be proportional both to
the number of bugs remaining in the system and to the time elapsed since the last failure. Thus

$$r_{T_i}(t \mid N, \Lambda) = \Lambda (N - i + 1)\, t, \qquad t > 0.$$

This model differs from models 1 to 4 in that the failure rate does not decrease monotonically.
Immediately after the i-th failure the failure rate drops to 0, and it then increases linearly with slope
$\Lambda(N - i)$ until the (i+1)th failure; see Figure 2(iv).
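Integrating the linear failure rate gives $P(T_i > t) = \exp(-\Lambda(N - i + 1)t^2/2)$, a Rayleigh distribution with mean $\sqrt{\pi / (2\Lambda(N - i + 1))}$. The following Python sketch evaluates these expressions and draws $T_i$ by inverse-transform sampling; the function names are hypothetical.

```python
import math
import random

def sw_survival(t, i, N, lam):
    """P(T_i > t) = exp(-Lambda*(N-i+1)*t**2/2), from integrating the rate Lambda*(N-i+1)*u."""
    return math.exp(-lam * (N - i + 1) * t * t / 2.0)

def sw_mean_gap(i, N, lam):
    """E[T_i] = sqrt(pi / (2*Lambda*(N-i+1))), the mean of this Rayleigh distribution."""
    return math.sqrt(math.pi / (2.0 * lam * (N - i + 1)))

def sw_sample_gap(i, N, lam, rng):
    """Inverse transform: solve exp(-c*t**2/2) = U for t, where c = Lambda*(N-i+1)."""
    c = lam * (N - i + 1)
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return math.sqrt(-2.0 * math.log(u) / c)

rng = random.Random(7)
draws = [sw_sample_gap(1, 1, 1.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws), sw_mean_gap(1, 1, 1.0))  # sample mean vs exact mean
```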

6. Bayesian Differential Debugging Model (Littlewood (1980)).

This model can be considered an elaboration of model 2, proposed by Littlewood & Verrall. Recall that in
model 2, $\lambda_i$, the failure rate of the i-th time between failures, was declared to have a gamma
distribution. In this new model Littlewood supposed that there were $N$ bugs in the system (a return to
bug counting), and then proposed that $\lambda_i$ be specified as a function of the remaining bugs. In
particular, he stated that $\lambda_i = \phi_1 + \phi_2 + \cdots + \phi_{N-i}$, where the $\phi_j$ were
independent and identically distributed gamma random variables with shape $\alpha$ and rate $\beta$. This
implies that $\lambda_i$ has a gamma distribution with shape $\alpha(N - i)$ and rate $\beta$. In other
respects its assumptions are identical to those of the original Littlewood & Verrall model.
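The key step in this model is that a sum of $N - i$ independent gamma variables with common shape $\alpha$ and rate $\beta$ is again gamma, with shape $\alpha(N - i)$. A quick Monte Carlo check in Python compares the sample mean and variance with $k\alpha/\beta$ and $k\alpha/\beta^2$ for $k = N - i$ terms; the parameter values are hypothetical, and `gammavariate` is again given the scale $1/\beta$.

```python
import random
import statistics

def differential_debugging_rate(remaining, alpha, beta, rng):
    """lambda_i as the sum of per-bug rates phi_1..phi_remaining, each gamma(shape alpha, rate beta)."""
    return sum(rng.gammavariate(alpha, 1.0 / beta) for _ in range(remaining))

def check_gamma_sum(remaining, alpha, beta, reps=20000, seed=2):
    """Sample mean and variance next to the gamma(shape alpha*remaining, rate beta) values."""
    rng = random.Random(seed)
    draws = [differential_debugging_rate(remaining, alpha, beta, rng) for _ in range(reps)]
    return (statistics.fmean(draws), statistics.variance(draws),
            remaining * alpha / beta, remaining * alpha / beta ** 2)

# Hypothetical values: N - i = 3 remaining bugs, alpha = 2, beta = 4.
print(check_gamma_sum(3, 2.0, 4.0))
```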

7. Bayes Empirical Bayes or Hierarchical Model (Mazzuchi & Soyer (1988)).

In 1988 Mazzuchi and Soyer proposed a Bayes empirical Bayes, or hierarchical, extension of the Littlewood
& Verrall model (model 2). As with the original model, they assumed $T_i$ to be exponentially distributed
with parameter $\lambda_i$. They then proposed two ideas for describing $\lambda_i$, here called model A
and model B.

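Model A:

Still assume that $\lambda_i$ is described by a gamma distribution, but with parameters $\alpha$ and
$\beta$. Now assume that $\alpha$ and $\beta$ are independent and that they are themselves described by
probability distributions: $\alpha$ by a uniform and $\beta$ by another gamma. In other words:

$$\pi_{\lambda_i}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}, \qquad \lambda > 0,$$

$$\pi(\alpha \mid \nu) = \frac{1}{\nu}, \qquad 0 < \alpha < \nu,$$

$$\pi(\beta \mid a, b) = \frac{b^{a}}{\Gamma(a)} \beta^{a - 1} e^{-b\beta}, \qquad \beta > 0, \quad a > 0, \quad b > 0.$$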


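Model B:

Assume that $\lambda_i$ is described exactly as in Littlewood & Verrall, i.e.

$$\pi_{\lambda_i}(\lambda \mid \alpha, \psi(i)) = \frac{(\psi(i))^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\psi(i)\lambda}, \qquad \lambda > 0,$$

with $\psi(i) = \beta_0 + \beta_1 i$, except that now probability distributions are placed on $\alpha$,
$\beta_0$ and $\beta_1$ as follows:

$$\pi(\alpha \mid \omega) = \frac{1}{\omega}, \qquad 0 < \alpha < \omega,$$

$$\pi(\beta_0 \mid a, b, \beta_1) = \frac{b^{a}}{\Gamma(a)} (\beta_0 + \beta_1)^{a - 1} e^{-b(\beta_0 + \beta_1)}, \qquad \beta_0 > -\beta_1, \quad a > 0, \quad b > 0,$$

$$\pi(\beta_1 \mid c, d) = \frac{d^{c}}{\Gamma(c)} \beta_1^{c - 1} e^{-d\beta_1}, \qquad \beta_1 > 0, \quad c > 0, \quad d > 0.$$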


So $\alpha$ is described by a uniform distribution, $\beta_0$ by a shifted gamma and $\beta_1$ by another
gamma, and there is dependence between $\beta_0$ and $\beta_1$. By taking $\beta_1$ to be degenerate at 0,
model A is obtained from model B. The authors were able to find an approximation to the expectation of
$T_{n+1}$ given $T_1 = t_1, T_2 = t_2, \ldots, T_n = t_n$, and so could use their model to predict the
future reliability of the software in the light of the previous failure times.
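To see how the hierarchy in model A fits together, the following Python sketch computes the prior predictive survival function $P(T_i > t)$ by Monte Carlo. Given $\alpha$ and $\beta$, integrating out $\lambda_i$ gives $(\beta/(\beta + t))^{\alpha}$ exactly, so only $\alpha \sim$ uniform$(0, \nu)$ and $\beta \sim$ gamma(shape $a$, rate $b$) are simulated. This is only an illustration of the prior structure with hypothetical hyperparameters; it is not the posterior approximation that Mazzuchi and Soyer derived.

```python
import random

def model_a_prior_survival(t, nu, a, b, reps=20000, seed=3):
    """Monte Carlo prior predictive P(T_i > t) under model A.

    alpha ~ uniform(0, nu) and beta ~ gamma(shape a, rate b); with lambda_i integrated
    out analytically, P(T_i > t | alpha, beta) = (beta / (beta + t))**alpha.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        alpha = rng.uniform(0.0, nu)
        beta = rng.gammavariate(a, 1.0 / b)  # shape a, scale 1/b (i.e. rate b)
        total += (beta / (beta + t)) ** alpha
    return total / reps

# Hypothetical hyperparameters nu = 3, a = 2, b = 1.
print([round(model_a_prior_survival(t, 3.0, 2.0, 1.0), 3) for t in (0.5, 1.0, 2.0, 5.0)])
```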


8. Time-dependent Error Detection Model (Goel & Okumoto (1979)).

This is the first Type II model that we consider. It assumes that $M(t)$, the number of failures of the
software in time $[0, t)$, is described by a Poisson process with intensity function

$$\lambda(t) = a b e^{-bt},$$

where $a$ is the total expected number of bugs in the system and $b$ is the fault detection rate; see
Figure 2(v). Thus the expected number of failures up to time $t$ is

$$\mu(t) = \int_0^t a b e^{-bs}\, ds = a(1 - e^{-bt}).$$

The function $\mu(t)$ completely specifies a particular Poisson process, and the distribution of $M(t)$ is
given by the well-known formula

$$P(M(t) = n) = \frac{(a(1 - e^{-bt}))^n e^{-a(1 - e^{-bt})}}{n!}, \qquad n = 0, 1, 2, \ldots$$


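Experience has shown that the rate at which software failures occur often increases initially before
eventually decreasing, and so in Goel (1983) the model was modified to account for this by letting

$$\lambda(t) = a b c\, t^{c-1} e^{-b t^{c}},$$

where $a$ is still the total expected number of bugs and $b$ and $c$ describe the quality of testing. The
corresponding mean value function is $\mu(t) = a(1 - e^{-b t^{c}})$, and $c = 1$ recovers the original
model.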
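The Goel & Okumoto formulas translate directly into code. This Python sketch implements the mean value and intensity functions of both the 1979 model and the Goel (1983) modification, together with $P(M(t) = n)$ computed on the log scale; the values $a = 10$, $b = 0.5$ (and $b = 1$, $c = 2$ for the modified model) are hypothetical.

```python
import math

def go_mean(t, a, b):
    """Goel-Okumoto (1979) mean value function mu(t) = a*(1 - exp(-b*t))."""
    return a * (1.0 - math.exp(-b * t))

def go_intensity(t, a, b):
    """Goel-Okumoto intensity lambda(t) = a*b*exp(-b*t), the derivative of go_mean."""
    return a * b * math.exp(-b * t)

def goel_mean(t, a, b, c):
    """Goel (1983) mean value function mu(t) = a*(1 - exp(-b*t**c))."""
    return a * (1.0 - math.exp(-b * t ** c))

def goel_intensity(t, a, b, c):
    """Goel (1983) intensity a*b*c*t**(c-1)*exp(-b*t**c); requires t > 0 when c < 1."""
    return a * b * c * t ** (c - 1) * math.exp(-b * t ** c)

def prob_failures(n, mu):
    """P(M(t) = n) for a Poisson process with mean mu = mu(t), on the log scale."""
    if mu == 0.0:
        return float(n == 0)
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

# Hypothetical values: a = 10 expected bugs, b = 0.5, evaluated at t = 2.
mu2 = go_mean(2.0, 10.0, 0.5)
print(mu2, [round(prob_failures(n, mu2), 4) for n in range(5)])
# With c = 2 the Goel (1983) intensity first rises and then falls.
print([round(goel_intensity(t, 10.0, 1.0, 2.0), 3) for t in (0.25, 0.5, 0.75, 1.5, 3.0)])
```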


9. Logarithmic Poisson Execution Time Model (Musa & Okumoto (1984)).

The Logarithmic Poisson Execution Time Model of Musa and Okumoto is one of the more popular software
failure models of recent years. It is a Type II model, but it is not derived by directly assuming some
intensity function $\lambda(t)$, as was the case with model 8 of Goel & Okumoto. Here $\lambda(t)$ is
expressed in terms of $\mu(t)$, the expected number of failures in time $[0, t)$, via the relationship

$$\lambda(t) = \lambda_0 e^{-\theta \mu(t)}.$$

Put simply, this relationship encapsulates the belief that the intensity (or rate) of failures at time $t$
decreases exponentially with the number of failures experienced, and so bugs fixed, up to time $t$. The
fixing of earlier failures reduces $\lambda(t)$ more than the fixing of later ones. Since we are modeling
the number of failures by a Poisson process, there is another relationship between $\lambda(t)$ and
$\mu(t)$, namely

$$\mu(t) = \int_0^t \lambda(s)\, ds.$$

Differentiating the second relationship and substituting the first gives
$\mu'(t) = \lambda_0 e^{-\theta\mu(t)}$ with $\mu(0) = 0$, so that $e^{\theta\mu(t)} = \lambda_0\theta t + 1$.
Hence there is a unique solution for the two functions:

$$\mu(t) = \frac{1}{\theta} \ln(\lambda_0 \theta t + 1), \qquad \lambda(t) = \frac{\lambda_0}{\lambda_0 \theta t + 1}.$$
Figure 2(vi) shows a plot of $\lambda(t)$ versus $t$; it is similar to the plot of Figure 2(v) except that
the tail is thicker.


It now follows from the above, using $P(M(t) = n) = (\mu(t))^n e^{-\mu(t)}/n!$, that

$$P(M(t) = n) = \frac{(\ln(\lambda_0 \theta t + 1))^n}{\theta^n (\lambda_0 \theta t + 1)^{1/\theta}\, n!}, \qquad n = 0, 1, 2, \ldots$$


As a final remark, we mention that in their paper the authors go into some detail on the estimation of
$\lambda_0$ and $\theta$ by maximum likelihood methods; however, one of the likelihoods appears to be
incorrect.
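The closed-form solution can be checked in a few lines of Python: `mo_intensity` should equal $\lambda_0 e^{-\theta\mu(t)}$ and should also be the derivative of `mo_mean`, while `mo_prob_failures` evaluates the Poisson probabilities above. The function names and the values $\lambda_0 = 2$, $\theta = 0.5$ are hypothetical.

```python
import math

def mo_mean(t, lam0, theta):
    """mu(t) = ln(lam0*theta*t + 1) / theta (log1p keeps accuracy for small t)."""
    return math.log1p(lam0 * theta * t) / theta

def mo_intensity(t, lam0, theta):
    """lambda(t) = lam0 / (lam0*theta*t + 1)."""
    return lam0 / (lam0 * theta * t + 1.0)

def mo_prob_failures(n, t, lam0, theta):
    """P(M(t) = n) with mean mu(t), computed on the log scale for numerical stability."""
    mu = mo_mean(t, lam0, theta)
    if mu == 0.0:
        return float(n == 0)
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

# Hypothetical values lam0 = 2 failures per unit time and theta = 0.5.
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, mo_mean(t, 2.0, 0.5), mo_intensity(t, 2.0, 0.5))
```

Compared with the Goel & Okumoto mean value function, $\mu(t)$ here grows without bound (logarithmically), which corresponds to the thicker tail of the intensity noted above.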


10. Random Coefficient Autoregressive Process Model (Singpurwalla & Soyer (1985)).

This is a Type I-2 model, that is, one that does not consider the failure rate of the times between
failures. Instead it assumes that there is some pattern between successive failure times and that this
pattern can be described by a functional relationship between them. The authors declare this relationship
to be of the form

$$T_i = T_{i-1}^{\theta_i}, \qquad i = 1, 2, 3, \ldots,$$

where $T_0$ is the time to the first failure and $\theta_i$ is some unknown coefficient. If all the
$\theta_i$'s are bigger than 1 then we expect successive lifelengths to increase, and if all the
$\theta_i$'s are smaller than 1 we expect successive lifelengths to decrease (the lifelengths are scaled so
that $T_i > 1$, as described below).

Uncertainty in the above relationship is expressed via an error term $\delta_i$, so that

$$T_i = \delta_i T_{i-1}^{\theta_i}.$$

The authors then make the following assumptions, which greatly facilitate the analysis of this model.
They assume the $T_i$'s to be lognormally distributed, that is to say that the $\log T_i$'s have a normal
distribution, and that they are all scaled so that $T_i > 1$. The $\delta_i$'s are also assumed to be
lognormal, with median 1 (so that $\log \delta_i$ has mean 0) and with $\log \delta_i$ having variance
$\sigma_1^2$ (the conventional notation is $\Lambda(0, \sigma_1^2)$). Then, taking logs in the relationship
above, they obtain

$$\log T_i = \theta_i \log T_{i-1} + \log \delta_i = \theta_i \log T_{i-1} + \epsilon_i, \text{ say.}$$

Since the $T_i$'s and the $\delta_i$'s are lognormal, the $\log T_i$'s and the $\epsilon_i$'s
($= \log \delta_i$) are normally distributed, and in particular $\epsilon_i$ has mean 0 and variance
$\sigma_1^2$ (the conventional notation is $N(0, \sigma_1^2)$). The log-lifelengths therefore form what is
known as an autoregressive process of order 1 with random coefficients $\theta_i$. There is an extensive
literature on such processes, which can now be used on this model.

All that remains is to specify $\theta_i$, and the authors consider several alternative models. For
example, one could make $\theta_i$ itself an autoregressive process:

$$\theta_i = \alpha \theta_{i-1} + \omega_i, \quad \text{where } \omega_i \sim N(0, W_i) \text{ with } W_i \text{ known.}$$

When $\alpha$ is known, the expressions for $\log T_i$ and $\theta_i$ together form a Kalman filter model,
on which there is also an extensive literature. When $\alpha$ is not known, the solution is via an
adaptive Kalman filter algorithm, for which the above authors propose an approach. As an alternative to
the above, one could place a two-stage distribution on $\theta_i$; the authors considered the idea of
$\theta_i$ being $N(\lambda, \sigma_2^2)$, with $\lambda$ also a normal random variable having mean $m_0$
and variance $s_0^2$. In this latter case one can employ standard hierarchical Bayesian inference
techniques to predict future reliability in the light of previous failure data.
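For the case where $\alpha$ is known, the following Python sketch runs a standard scalar Kalman filter on the pair of equations $\log T_i = \theta_i \log T_{i-1} + \epsilon_i$ and $\theta_i = \alpha\theta_{i-1} + \omega_i$, treating $\log T_{i-1}$ as a known observation coefficient. It assumes a normal prior with mean `m0` and variance `C0` for $\theta_0$ and, for simplicity, a constant $W_i = W$; these choices, the function names and the data are hypothetical and are not the authors' adaptive algorithm.

```python
def kalman_theta(log_times, alpha, W, sigma1_sq, m0, C0):
    """Filtered (mean, variance) of theta_1..theta_n given log T_0..log T_n.

    Observation: log T_i = theta_i * log T_{i-1} + eps_i, eps_i ~ N(0, sigma1_sq).
    State:       theta_i = alpha * theta_{i-1} + omega_i, omega_i ~ N(0, W).
    Needs at least two log-lifelengths.
    """
    m, C = m0, C0
    filtered = []
    for x, y in zip(log_times[:-1], log_times[1:]):
        a = alpha * m             # prior mean of theta_i
        R = alpha ** 2 * C + W    # prior variance of theta_i
        f = x * a                 # one-step forecast of log T_i
        Q = x ** 2 * R + sigma1_sq
        K = R * x / Q             # Kalman gain
        m = a + K * (y - f)
        C = R - K * x * R
        filtered.append((m, C))
    return filtered

def forecast_next_log_time(log_times, alpha, W, sigma1_sq, m0, C0):
    """Mean and variance of log T_{n+1} given log T_0..log T_n."""
    m, C = kalman_theta(log_times, alpha, W, sigma1_sq, m0, C0)[-1]
    x = log_times[-1]
    return x * alpha * m, x ** 2 * (alpha ** 2 * C + W) + sigma1_sq

# Hypothetical data in which each log-lifelength roughly doubles (theta close to 2).
logs = [1.0, 2.0, 4.0, 8.0, 16.0]
print(kalman_theta(logs, 1.0, 0.0, 0.01, 1.0, 10.0)[-1])
print(forecast_next_log_time(logs, 1.0, 0.0, 0.01, 1.0, 10.0))
```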

Figure 2 shows the failure rates for models 1, 2, 3 and 5, and the intensity functions for models 8 and 9.

Figure 2. Failure rates [panels (i) to (iv)] and intensity functions [panels (v) and (vi)]; panel (vi): the
intensity function for the model of Musa and Okumoto.
4. Model Unification.


By adopting a Bayesian approach, it turns out that one can unify models 1, 2 and 8 (the models of
Jelinski & Moranda, Littlewood & Verrall and Goel & Okumoto, respectively) under a general framework.
Observe that this also provides a link between the two types of models, since models 1 and 2 are of
Type I whilst model 8 is of Type II.

We begin by recalling the first model, that of Jelinski & Moranda. Each $T_i$ is assumed to have a
constant failure rate $\Lambda(N - i + 1)$. It is well known that this implies that each $T_i$ is
exponentially distributed with mean $(\Lambda(N - i + 1))^{-1}$. Now assume that $\Lambda$ and/or $N$ is
unknown; in true Bayesian fashion, prior distributions are placed upon them.

To obtain model 8 of Goel & Okumoto, we let $\Lambda$ be degenerate at $\lambda$ and let $N$ have a
Poisson distribution with mean $\theta$. One can calculate the distribution of $M(t)$ using the $T_i$'s as
defined by Jelinski & Moranda, and then, by averaging over $N$, one finds that $M(t)$ is indeed a Poisson
process with mean

$$\mu(t) = \theta(1 - e^{-\lambda t}),$$

which is the form of $\mu(t)$ for Goel & Okumoto's model, with $a = \theta$ and $b = \lambda$.
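This unification can be checked by simulation. Under Jelinski & Moranda with $N$ bugs and constant $\Lambda$, the failures occur exactly as if each bug had its own exponential($\Lambda$) detection time: by the memoryless property, the minimum of the $N - i + 1$ remaining times is exponential with rate $\Lambda(N - i + 1)$. The Python sketch below draws $N$ from a Poisson distribution (via Knuth's multiplication method, since the standard library has no Poisson sampler), counts detections in $[0, t)$, and compares the sample mean and variance of $M(t)$ with $\theta(1 - e^{-\lambda t})$. The parameter values are hypothetical.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) variate (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def failures_by_time(t, theta, lam, rng):
    """One draw of M(t): N ~ Poisson(theta) bugs, each with an exponential(lam) detection time."""
    n_bugs = poisson_draw(theta, rng)
    return sum(rng.expovariate(lam) < t for _ in range(n_bugs))

def check_go_mean(t, theta, lam, reps=20000, seed=4):
    """Sample mean and variance of M(t), next to the Goel-Okumoto mean theta*(1 - exp(-lam*t))."""
    rng = random.Random(seed)
    draws = [failures_by_time(t, theta, lam, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var, theta * (1.0 - math.exp(-lam * t))

# Hypothetical values: theta = 20 expected bugs, lambda = 0.5, t = 2.
print(check_go_mean(2.0, 20.0, 0.5))
```

That the sample mean and variance both land near the same value is consistent with $M(t)$ being Poisson, as the argument above asserts.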

One can also obtain model 2 by taking $N$ to be degenerate and $\Lambda$ to have a gamma distribution. The
derivations that lead to the above are complex; readers are referred to Langberg and Singpurwalla (1985)
for the details.
5. An application: optimal testing of software.

The failure models reviewed in the preceding sections can be used for more than inference or the
prediction of software failure. They can also be applied within the framework of decision theory to solve
decision problems. An important example of such a problem is determining the optimal time to test software
before releasing it. This involves balancing the costs of testing and the risk of software obsolescence
against the cost of in-service failure, should a bug not be corrected during the testing period. The
following is taken from Singpurwalla (1991), in which a strongly Bayesian approach is taken.

Implementing a decision-theoretic procedure requires two key ingredients. The first is a probability
model; here we take a generalization of the Jelinski & Moranda model. The second is a consideration of
the costs and benefits, or utilities, associated with a particular decision, i.e. the costs of testing,
the benefits and costs of fixing a bug, etc. Decision theory states that the optimal decision (in this
case the length of the test) is the one that maximizes expected utility.

If the software is to be tested for some time, say $T$ units, and then released, the problem is to find
the $T$ that maximizes expected utility. This is called single-stage testing. A more complex, yet
realistic, scenario is two-stage testing. Here the software is tested for $T$ units of time and then,
depending on how many failures $M(T)$ were observed during that test, a decision is made on whether to
continue testing for a further $T^*$ units. The problem here is to find the optimal $T$ and $T^*$, with
$T^*$ to be determined before $M(T)$ is observed. Finally, there is a third testing scenario, namely
sequential testing. Here $T^*$ is determined after $M(T)$ is observed; this procedure can continue for
several stages, with $T^{**}$ being determined after $M(T^*)$ is observed, and so on. Here we consider the
case of single-stage testing. Figure 3 is a graph of the decision process associated with single-stage
testing.


Figure 3. Decision process for single-stage testing.
The model chosen in this paper is an extension of Jelinski & Moranda's model. We have

$$f_{T_i}(t \mid N, \Lambda) = \Lambda(N - i + 1) e^{-\Lambda(N - i + 1)t}, \qquad t > 0.$$

In the previous section we placed a prior distribution on one of $N$ or $\Lambda$. Now we place priors on
both parameters, and say that $N$ has a Poisson distribution with mean $\theta$, $\Lambda$ has a gamma
distribution with scale $\mu$ and shape $\alpha$, and $N$ and $\Lambda$ are independent.

We now turn to the choice of utility function. The following assumptions are made:

i) The utility of a program that encounters $j$ bugs during its operation is $a_1 + a_2 e^{-a_3 j}$.

ii) The cost of fixing a bug is some constant $C_1$.

iii) Let $f(T)$ be the cost of testing and of lost opportunity up to time $T$; here we say f(T) = dT*.

Note from i) that the utility of a bug-free program is $a_1 + a_2$, and the utility of a program with a
very large number of bugs is near $a_1$, so that typically $a_1$ is a large negative number (because there
is a great loss associated with software that is constantly failing in the marketplace) and $a_2 > 0$.
Combining these assumptions gives the utility of a program that is tested for $T$ units of time, during
which $M(T)$ bugs are found and corrected, and then released, after which $j$ bugs are encountered by the
customer, as

$$\mathcal{U}(T, M(T), j) = e^{-bT} \{a_1 + a_2 e^{-a_3 j} - C_1 M(T) - f(T)\},$$

where $e^{-bT}$ is a discounting factor.

Now the two parts of the decision process, the probability model and the utility function, are brought
together. We wish to find the time $T$ that maximizes expected utility; in other words, find $T$ such that
$E(\mathcal{U}(T, M(T), j))$ is a maximum, where the expectation is taken, using our failure model, with
respect to $M(T)$ and $j$. This maximization is quite complex and must be done numerically by computer.
The details are found in the paper, but the end result is best displayed as a graph of testing time
against expected utility (Figure 4); in this case one can see that the software should be tested for about
3.5 time units.
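The numerical maximization can be sketched with Monte Carlo in Python. The sketch draws $N \sim$ Poisson($\theta$) and $\Lambda \sim$ gamma(shape $\alpha$, scale $\mu$), generates per-bug detection times as in Section 4, and averages the discounted utility over a grid of testing times, reusing the same draws for every $T$ (common random numbers) so the curve is smooth. Three assumptions are made purely for illustration and are not taken from Singpurwalla (1991): every bug still present at release is eventually encountered, so $j = N - M(T)$; the testing cost is linear, $f(T) = dT$; and all parameter values are hypothetical. It does not reproduce Figure 4.

```python
import bisect
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) variate (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def expected_utility_curve(grid, theta, alpha, mu, a1, a2, a3, C1, d, b, reps=5000, seed=5):
    """Monte Carlo estimate of E[U(T, M(T), j)] for each testing time T in grid.

    Illustrative assumptions: Lambda ~ gamma(shape alpha, scale mu); j = N - M(T);
    linear testing cost f(T) = d*T.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(grid)
    for _ in range(reps):
        n_bugs = poisson_draw(theta, rng)
        lam = rng.gammavariate(alpha, mu)
        times = sorted(rng.expovariate(lam) for _ in range(n_bugs))  # per-bug detection times
        for k, T in enumerate(grid):
            found = bisect.bisect_left(times, T)  # M(T): detections in [0, T)
            j = n_bugs - found                    # bugs left for the customer
            utility = a1 + a2 * math.exp(-a3 * j) - C1 * found - d * T
            totals[k] += math.exp(-b * T) * utility
    return [s / reps for s in totals]

def best_testing_time(grid, **params):
    """Grid point with the largest estimated expected utility, and that utility."""
    curve = expected_utility_curve(grid, **params)
    k = max(range(len(grid)), key=curve.__getitem__)
    return grid[k], curve[k]

grid = [0.5 * k for k in range(13)]
print(best_testing_time(grid, theta=20.0, alpha=2.0, mu=0.25, a1=-100.0, a2=200.0,
                        a3=0.5, C1=1.0, d=5.0, b=0.05))
```

A grid search is used only because it is simple to read; with a smooth estimated curve, any one-dimensional optimizer would serve equally well.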


Figure 4. Time of testing versus expected utility for the model in Section 5 (horizontal axis: time;
vertical axis: expected utility).
6. Conclusion.


This paper has attempted to review the main methods, and some of the better-known models, that have been
used by the statistics community in the area of software reliability. The first models were almost always
based on the failure rate of the software; later the idea of modeling the number of failures by a Poisson
process was used, and most recently autoregressive processes have been suggested as an alternative to the
failure rate method. Application of the failure models, for example to the optimal testing decision
problem, is another important aspect of the field.

Earlier it was pointed out that there are almost no data on the reliability of commercial software, owing
to the sensitive nature of that information. A possible way of overcoming this problem would be to have
more interaction between the statistics and computer science communities. In the future such interaction
seems essential if models are to become more realistic and useful, and it is perhaps surprising that there
are so few links between the two groups today.

There still remains much to be researched in this field. In the case of optimal testing, plans for
two-stage and sequential testing need to be developed, whilst the verification of current and future
models is likely to remain a problem. Nevertheless, because of the increasing presence of computers in all
aspects of our daily lives, the topic of software reliability can only become more important in the
future.